Challenge understanding

Objective

Predict which passengers survived, using the Titanic dataset.

Competition Description

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.

In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy. https://www.kaggle.com/c/titanic




Initial Idea

Load Library Modules
Load Datasets
Explore datasets
Analyse relations between features
Analyse missing values
Analyse features
Prepare for modelling
Modelling
Prepare the prediction for submission

1. Loading Library Modules



In [13]:

    
import warnings
warnings.filterwarnings('ignore')

# SKLearn Model Algorithms
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression , Perceptron

from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC

# SKLearn ensemble classifiers
from sklearn.ensemble import RandomForestClassifier , GradientBoostingClassifier
from sklearn.ensemble import ExtraTreesClassifier , BaggingClassifier
from sklearn.ensemble import VotingClassifier , AdaBoostClassifier

# SKLearn Modelling Helpers
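from sklearn.preprocessing import Normalizer , scale
from sklearn.impute import SimpleImputer  # replaces preprocessing.Imputer, which was removed from scikit-learn
from sklearn.model_selection import train_test_split , StratifiedKFold  # sklearn.cross_validation was removed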
from sklearn.feature_selection import RFECV

# Handle table-like data and matrices
import numpy as np
import pandas as pd

# Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
import seaborn as sns

# plot functions
import pltFunctions as pfunc

# Configure visualisations
%matplotlib inline
mpl.style.use( 'ggplot' )
sns.set_style( 'white' )
pylab.rcParams[ 'figure.figsize' ] = 8 , 6

2. Loading Datasets



In [14]:

    
train = pd.read_csv("./input/train.csv")
test    = pd.read_csv("./input/test.csv")



In [15]:

    
#combined = pd.concat([train.drop('Survived',1),test])
#combined = train.append( test, ignore_index = True)
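# DataFrame.append was removed in pandas 2.0; sort=True keeps the alphabetical column order shown below
full = pd.concat([train, test], ignore_index=True, sort=True)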
del train, test
#train = full[ :891 ]
#combined = combined.drop( 'Survived',1)
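
Stacking the training and test sets means the rows from `test.csv` carry `NaN` in `Survived`. A minimal sketch of how the two parts could be separated again later, using the label column rather than a hard-coded row count such as 891. The `*_demo` frames and the other variable names are illustrative stand-ins, not part of the notebook.

```python
import pandas as pd

# Tiny stand-in frames; the notebook itself reads ./input/train.csv and ./input/test.csv.
train_demo = pd.DataFrame({"PassengerId": [1, 2, 3],
                           "Survived": [0, 1, 1],
                           "Age": [22.0, 38.0, None]})
test_demo = pd.DataFrame({"PassengerId": [4, 5],
                          "Age": [34.5, None]})

# Stack the two sets; rows coming from the test set get NaN in 'Survived'.
full_demo = pd.concat([train_demo, test_demo], ignore_index=True)

# Recover the split from the label column instead of relying on row positions.
is_labelled = full_demo["Survived"].notna()
train_part = full_demo[is_labelled]
test_part = full_demo[~is_labelled].drop(columns="Survived")

print(train_part.shape, test_part.shape)
```

Splitting on the label keeps working even if rows are later reordered or dropped.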



In [16]:

    
#print ('Datasets:' , 'combined:' , combined.shape , 'full:' , full.shape , 'train:' , train.shape)

3. Exploring datasets



In [17]:

    
full.head(10)









    Out[17]:







  
    
      
    Age  Cabin  Embarked     Fare  Name  \
0  22.0    NaN         S   7.2500  Braund, Mr. Owen Harris
1  38.0    C85         C  71.2833  Cumings, Mrs. John Bradley (Florence Briggs Th...
2  26.0    NaN         S   7.9250  Heikkinen, Miss. Laina
3  35.0   C123         S  53.1000  Futrelle, Mrs. Jacques Heath (Lily May Peel)
4  35.0    NaN         S   8.0500  Allen, Mr. William Henry
5   NaN    NaN         Q   8.4583  Moran, Mr. James
6  54.0    E46         S  51.8625  McCarthy, Mr. Timothy J
7   2.0    NaN         S  21.0750  Palsson, Master. Gosta Leonard
8  27.0    NaN         S  11.1333  Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
9  14.0    NaN         C  30.0708  Nasser, Mrs. Nicholas (Adele Achem)

   Parch  PassengerId  Pclass     Sex  SibSp  Survived  Ticket
0      0            1       3    male      1       0.0  A/5 21171
1      0            2       1  female      1       1.0  PC 17599
2      0            3       3  female      0       1.0  STON/O2. 3101282
3      0            4       1  female      1       1.0  113803
4      0            5       3    male      0       0.0  373450
5      0            6       3    male      0       0.0  330877
6      0            7       1    male      0       0.0  17463
7      1            8       3    male      3       0.0  349909
8      2            9       3  female      0       1.0  347742
9      0           10       2  female      1       1.0  237736



In [18]:

    
print(full.isnull().sum())









    



Age             263
Cabin          1014
Embarked          2
Fare              1
Name              0
Parch             0
PassengerId       0
Pclass            0
Sex               0
SibSp             0
Survived        418
Ticket            0
dtype: int64
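
From the counts above, `Cabin` is missing for 1014 of 1309 rows (about 77.5%) and `Age` for 263 (about 20.1%), while the 418 missing `Survived` values are simply the test rows. A small helper along the following lines can report counts and percentages together. `missing_summary` and the `demo` frame are hypothetical names used only for illustration.

```python
import pandas as pd

def missing_summary(df):
    """Return count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (100 * counts / len(df)).round(1),
    })
    # Keep only the columns that actually have gaps.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Small stand-in frame instead of the real `full` DataFrame.
demo = pd.DataFrame({
    "Age": [22.0, None, 26.0, None],
    "Cabin": [None, "C85", None, None],
    "Sex": ["male", "female", "female", "male"],
})
print(missing_summary(demo))
```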



In [19]:

    
pd.crosstab(full['Pclass'], full['Sex'])
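
The output of this cell is not preserved in the export. As a hedged illustration of what `pd.crosstab` returns, the sketch below builds the same kind of class-by-sex table on a made-up frame (`demo` is not the Titanic data), once with totals and once as row shares.

```python
import pandas as pd

# Made-up passengers, only to show the shape of the result.
demo = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Sex": ["female", "male", "male", "female", "male", "male"],
})

# Raw counts, with an extra 'All' row and column holding the totals.
counts = pd.crosstab(demo["Pclass"], demo["Sex"], margins=True)

# Share of each sex within a class (every row sums to 1).
shares = pd.crosstab(demo["Pclass"], demo["Sex"], normalize="index")

print(counts)
print(shares)
```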



In [20]:

    
# Mean age per (Sex, Pclass) group, computed once and reused
agedf = full.groupby(['Sex','Pclass'])['Age'].mean()
print( agedf )
type( agedf )









    



Sex     Pclass
female  1         37.037594
        2         27.499223
        3         22.185329
male    1         41.029272
        2         30.815380
        3         25.962264
Name: Age, dtype: float64






    Out[20]:





pandas.core.series.Series



In [21]:

    
#for age in full:
#    if full['Age'].isnull():
#        print (agedf.where(agedf['Sex'] == full['Sex'])&(agedf['Pclass']==full['Pclass']))



In [22]:

    
def fillMissingAge(dframe):
    dframe['Age'] = dframe['Age'].fillna( dframe['Age'].mean())
    return dframe

def fillMissingFare(dframe):
    dframe['Fare'] = dframe['Fare'].fillna( dframe['Fare'].mean() )
    return dframe
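
`fillMissingAge` uses the overall mean, although the group means printed earlier differ noticeably by sex and class (from about 22 to about 41 years). A minimal sketch of a group-wise alternative, which is what the commented-out loop above appears to be aiming for. `fill_age_by_group` and the `demo` frame are hypothetical and not part of the notebook.

```python
import pandas as pd

def fill_age_by_group(df, keys=("Sex", "Pclass")):
    """Fill missing ages with the mean age of the passenger's (Sex, Pclass) group."""
    out = df.copy()
    # transform('mean') returns one value per row, aligned with the original index.
    group_mean = out.groupby(list(keys))["Age"].transform("mean")
    # Fall back to the overall mean if a whole group has no known age.
    out["Age"] = out["Age"].fillna(group_mean).fillna(out["Age"].mean())
    return out

demo = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Pclass": [1, 1, 3, 3, 3],
    "Age": [30.0, None, 20.0, 40.0, None],
})
filled = fill_age_by_group(demo)
print(filled)
```

The function works on a copy, so the input frame keeps its missing values and the two strategies can be compared side by side.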



In [23]:

    
full = fillMissingAge(full)
full = fillMissingFare(full)
print(full.isnull().sum())









    



Age               0
Cabin          1014
Embarked          2
Fare              0
Name              0
Parch             0
PassengerId       0
Pclass            0
Sex               0
SibSp             0
Survived        418
Ticket            0
dtype: int64




In [24]:

    
print(full[full['Embarked'].isnull()])









    



      Age Cabin Embarked  Fare                                       Name  \
61   38.0   B28      NaN  80.0                        Icard, Miss. Amelie   
829  62.0   B28      NaN  80.0  Stone, Mrs. George Nelson (Martha Evelyn)   

     Parch  PassengerId  Pclass     Sex  SibSp  Survived  Ticket  
61       0           62       1  female      0       1.0  113572  
829      0          830       1  female      0       1.0  113572



In [25]:

    
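# Sex is still a string column here (it is converted to 0/1 in a later cell),
# so select the female passengers by comparing against 'female' rather than 1
females = full[full['Sex'] == 'female']
pd.crosstab(females['Embarked'], females['Sex'])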









    Out[25]:



In [26]:

    
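# Sex is still a string column at this point, so filter on 'female';
# boolean indexing keeps only the matching rows, whereas where() would leave NaN rows behind
first_class_women = full[(full['Sex'] == 'female') & (full['Pclass'] == 1)]
first_class_women.groupby(['Embarked','Pclass','Parch','SibSp']).size()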



In [27]:

    
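# Hand-entered passenger counts per port of embarkation (C, Q, S) for the first group
nt=(115+60+291)
pC=115/nt
pQ=60/nt
pS=291/nt
print('Prob C :', pC, 'Prob Q :', pQ ,'Prob S :' , pS)

# Hand-entered counts per port (C, Q, S) for the second, narrower group
nC=(30+2+20)
p0C=30/nC
p0Q=2/nC
p0S=20/nC
print('Prob C :', p0C, 'Prob Q :', p0Q ,'Prob S :' , p0S)

# The sums below are not probabilities (the value for S exceeds 1);
# they only serve as a rough score for ranking the three ports
print( 'Sum of probabilities')
print('Prob C :', pC+p0C, 'Prob Q :', pQ+p0Q ,'Prob S :' , pS+p0S)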









    



Prob C : 0.24678111587982832 Prob Q : 0.12875536480686695 Prob S : 0.6244635193133047
Prob C : 0.5769230769230769 Prob Q : 0.038461538461538464 Prob S : 0.38461538461538464
Sum of probabilities
Prob C : 0.8237041928029052 Prob Q : 0.1672169032684054 Prob S : 1.0090789039286894
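
The counts in the cell above are typed in by hand. Assuming they are per-port passenger counts for two subgroups, the same shares could be computed directly from the data, as in this sketch. `port_shares` and the `demo` frame are illustrative names, and the demo values are made up.

```python
import pandas as pd

def port_shares(df, mask):
    """Share of each embarkation port among the rows selected by mask (NaN ports are ignored)."""
    return df.loc[mask, "Embarked"].value_counts(normalize=True)

# Made-up passengers; Sex is kept as a string, as it is at this point in the notebook.
demo = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "female"],
    "Pclass": [1, 1, 3, 1, 1],
    "Embarked": ["S", "C", "S", "S", None],
})

female = demo["Sex"] == "female"
first_class_female = female & (demo["Pclass"] == 1)

print(port_shares(demo, female))
print(port_shares(demo, first_class_female))
```

Each result is a proper distribution that sums to 1, so the two can be compared port by port instead of being added together.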



In [28]:

    
# Trying S for both passengers
# A single .loc assignment avoids chained indexing, which may silently write to a copy
full.loc[[61, 829], 'Embarked'] = "S"



In [29]:

    
print(full.isnull().sum())









    



Age               0
Cabin          1014
Embarked          0
Fare              0
Name              0
Parch             0
PassengerId       0
Pclass            0
Sex               0
SibSp             0
Survived        418
Ticket            0
dtype: int64



In [30]:

    
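def fillCabin(dframe):
    # Note: this also overwrites dframe['Cabin'] in place with the deck letter
    # Missing cabins become 'U' (unknown); otherwise keep only the first character
    dframe[ 'Cabin' ] = dframe['Cabin'].fillna( 'U' )
    dframe[ 'Cabin' ] = dframe[ 'Cabin' ].map( lambda c : c[0] )
    # dummy encoding; dtype is set explicitly because recent pandas versions default to bool
    dframe = pd.get_dummies( dframe['Cabin'] , prefix = 'Cabin' , dtype = 'uint8' )
    return dframe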



In [31]:

    
# Build the dummy columns once, inspect them, then attach them to the full frame
newDF = fillCabin(full)
print(newDF)
full = pd.concat([full, newDF], axis=1)
#full = full.drop('Cabin', axis=1)









    



      Cabin_A  Cabin_B  Cabin_C  Cabin_D  Cabin_E  Cabin_F  Cabin_G  Cabin_T  \
0           0        0        0        0        0        0        0        0   
1           0        0        1        0        0        0        0        0   
2           0        0        0        0        0        0        0        0   
3           0        0        1        0        0        0        0        0   
4           0        0        0        0        0        0        0        0   
5           0        0        0        0        0        0        0        0   
6           0        0        0        0        1        0        0        0   
7           0        0        0        0        0        0        0        0   
8           0        0        0        0        0        0        0        0   
9           0        0        0        0        0        0        0        0   
10          0        0        0        0        0        0        1        0   
11          0        0        1        0        0        0        0        0   
12          0        0        0        0        0        0        0        0   
13          0        0        0        0        0        0        0        0   
14          0        0        0        0        0        0        0        0   
15          0        0        0        0        0        0        0        0   
16          0        0        0        0        0        0        0        0   
17          0        0        0        0        0        0        0        0   
18          0        0        0        0        0        0        0        0   
19          0        0        0        0        0        0        0        0   
20          0        0        0        0        0        0        0        0   
21          0        0        0        1        0        0        0        0   
22          0        0        0        0        0        0        0        0   
23          1        0        0        0        0        0        0        0   
24          0        0        0        0        0        0        0        0   
25          0        0        0        0        0        0        0        0   
26          0        0        0        0        0        0        0        0   
27          0        0        1        0        0        0        0        0   
28          0        0        0        0        0        0        0        0   
29          0        0        0        0        0        0        0        0   
...       ...      ...      ...      ...      ...      ...      ...      ...   
1279        0        0        0        0        0        0        0        0   
1280        0        0        0        0        0        0        0        0   
1281        0        1        0        0        0        0        0        0   
1282        0        0        0        1        0        0        0        0   
1283        0        0        0        0        0        0        0        0   
1284        0        0        0        0        0        0        0        0   
1285        0        0        0        0        0        0        0        0   
1286        0        0        1        0        0        0        0        0   
1287        0        0        0        0        0        0        0        0   
1288        0        1        0        0        0        0        0        0   
1289        0        0        0        0        0        0        0        0   
1290        0        0        0        0        0        0        0        0   
1291        0        0        1        0        0        0        0        0   
1292        0        0        0        0        0        0        0        0   
1293        0        0        0        0        0        0        0        0   
1294        0        0        0        0        0        0        0        0   
1295        0        0        0        1        0        0        0        0   
1296        0        0        0        1        0        0        0        0   
1297        0        0        0        0        0        0        0        0   
1298        0        0        1        0        0        0        0        0   
1299        0        0        0        0        0        0        0        0   
1300        0        0        0        0        0        0        0        0   
1301        0        0        0        0        0        0        0        0   
1302        0        0        1        0        0        0        0        0   
1303        0        0        0        0        0        0        0        0   
1304        0        0        0        0        0        0        0        0   
1305        0        0        1        0        0        0        0        0   
1306        0        0        0        0        0        0        0        0   
1307        0        0        0        0        0        0        0        0   
1308        0        0        0        0        0        0        0        0   

      Cabin_U  
0           1  
1           0  
2           1  
3           0  
4           1  
5           1  
6           0  
7           1  
8           1  
9           1  
10          0  
11          0  
12          1  
13          1  
14          1  
15          1  
16          1  
17          1  
18          1  
19          1  
20          1  
21          0  
22          1  
23          0  
24          1  
25          1  
26          1  
27          0  
28          1  
29          1  
...       ...  
1279        1  
1280        1  
1281        0  
1282        0  
1283        1  
1284        1  
1285        1  
1286        0  
1287        1  
1288        0  
1289        1  
1290        1  
1291        0  
1292        1  
1293        1  
1294        1  
1295        0  
1296        0  
1297        1  
1298        0  
1299        1  
1300        1  
1301        1  
1302        0  
1303        1  
1304        1  
1305        0  
1306        1  
1307        1  
1308        1  

[1309 rows x 9 columns]
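
`fillCabin` both rewrites the `Cabin` column of the frame it receives and returns only the dummy columns. A variant without that side effect might look like the sketch below. `add_deck_dummies` and the `demo` frame are hypothetical names, not part of the notebook.

```python
import pandas as pd

def add_deck_dummies(df):
    """Return a copy of df with one 0/1 column per deck letter ('U' = unknown cabin)."""
    # First character of the cabin string is the deck; missing cabins become 'U'.
    deck = df["Cabin"].fillna("U").str[0]
    dummies = pd.get_dummies(deck, prefix="Cabin", dtype="uint8")
    # concat builds a new frame, so the caller's DataFrame is left untouched.
    return pd.concat([df, dummies], axis=1)

demo = pd.DataFrame({"Cabin": ["C85", None, "E46", "C123"]})
encoded = add_deck_dummies(demo)
print(encoded)
```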



In [32]:

    
full









    Out[32]:







  
    
      
            Age  Cabin  Embarked      Fare  Name  \
0     22.000000      U         S    7.2500  Braund, Mr. Owen Harris
1     38.000000      C         C   71.2833  Cumings, Mrs. John Bradley (Florence Briggs Th...
2     26.000000      U         S    7.9250  Heikkinen, Miss. Laina
3     35.000000      C         S   53.1000  Futrelle, Mrs. Jacques Heath (Lily May Peel)
4     35.000000      U         S    8.0500  Allen, Mr. William Henry
5     29.881138      U         Q    8.4583  Moran, Mr. James
6     54.000000      E         S   51.8625  McCarthy, Mr. Timothy J
7      2.000000      U         S   21.0750  Palsson, Master. Gosta Leonard
8     27.000000      U         S   11.1333  Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
9     14.000000      U         C   30.0708  Nasser, Mrs. Nicholas (Adele Achem)
10     4.000000      G         S   16.7000  Sandstrom, Miss. Marguerite Rut
11    58.000000      C         S   26.5500  Bonnell, Miss. Elizabeth
12    20.000000      U         S    8.0500  Saundercock, Mr. William Henry
13    39.000000      U         S   31.2750  Andersson, Mr. Anders Johan
14    14.000000      U         S    7.8542  Vestrom, Miss. Hulda Amanda Adolfina
15    55.000000      U         S   16.0000  Hewlett, Mrs. (Mary D Kingcome)
16     2.000000      U         Q   29.1250  Rice, Master. Eugene
17    29.881138      U         S   13.0000  Williams, Mr. Charles Eugene
18    31.000000      U         S   18.0000  Vander Planke, Mrs. Julius (Emelia Maria Vande...
19    29.881138      U         C    7.2250  Masselmani, Mrs. Fatima
20    35.000000      U         S   26.0000  Fynney, Mr. Joseph J
21    34.000000      D         S   13.0000  Beesley, Mr. Lawrence
22    15.000000      U         Q    8.0292  McGowan, Miss. Anna "Annie"
23    28.000000      A         S   35.5000  Sloper, Mr. William Thompson
24     8.000000      U         S   21.0750  Palsson, Miss. Torborg Danira
25    38.000000      U         S   31.3875  Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...
26    29.881138      U         C    7.2250  Emir, Mr. Farred Chehab
27    19.000000      C         S  263.0000  Fortune, Mr. Charles Alexander
28    29.881138      U         Q    7.8792  O'Dwyer, Miss. Ellen "Nellie"
29    29.881138      U         S    7.8958  Todoroff, Mr. Lalio
...         ...    ...       ...       ...  ...
1279  21.000000      U         Q    7.7500  Canavan, Mr. Patrick
1280   6.000000      U         S   21.0750  Palsson, Master. Paul Folke
1281  23.000000      B         S   93.5000  Payne, Mr. Vivian Ponsonby
1282  51.000000      D         S   39.4000  Lines, Mrs. Ernest H (Elizabeth Lindsey James)
1283  13.000000      U         S   20.2500  Abbott, Master. Eugene Joseph
1284  47.000000      U         S   10.5000  Gilbert, Mr. William
1285  29.000000      U         S   22.0250  Kink-Heilmann, Mr. Anton
1286  18.000000      C         S   60.0000  Smith, Mrs. Lucien Philip (Mary Eloise Hughes)
1287  24.000000      U         Q    7.2500  Colbert, Mr. Patrick
1288  48.000000      B         C   79.2000  Frolicher-Stehli, Mrs. Maxmillian (Margaretha ...
1289  22.000000      U         S    7.7750  Larsson-Rondberg, Mr. Edvard A
1290  31.000000      U         Q    7.7333  Conlon, Mr. Thomas Henry
1291  30.000000      C         S  164.8667  Bonnell, Miss. Caroline
1292  38.000000      U         S   21.0000  Gale, Mr. Harry
1293  22.000000      U         C   59.4000  Gibson, Miss. Dorothy Winifred
1294  17.000000      U         S   47.1000  Carrau, Mr. Jose Pedro
1295  43.000000      D         C   27.7208  Frauenthal, Mr. Isaac Gerald
1296  20.000000      D         C   13.8625  Nourney, Mr. Alfred (Baron von Drachstedt")"
1297  23.000000      U         S   10.5000  Ware, Mr. William Jeffery
1298  50.000000      C         C  211.5000  Widener, Mr. George Dunton
1299  29.881138      U         Q    7.7208  Riordan, Miss. Johanna Hannah""
1300   3.000000      U         S   13.7750  Peacock, Miss. Treasteall
1301  29.881138      U         Q    7.7500  Naughton, Miss. Hannah
1302  37.000000      C         Q   90.0000  Minahan, Mrs. William Edward (Lillian E Thorpe)
1303  28.000000      U         S    7.7750  Henriksson, Miss. Jenny Lovisa
1304  29.881138      U         S    8.0500  Spector, Mr. Woolf
1305  39.000000      C         C  108.9000  Oliva y Ocana, Dona. Fermina
1306  38.500000      U         S    7.2500  Saether, Mr. Simon Sivertsen
1307  29.881138      U         S    8.0500  Ware, Mr. Frederick
1308  29.881138      U         C   22.3583  Peter, Master. Michael J

      Parch  PassengerId  Pclass     Sex  SibSp  ...  Ticket  \
0         0            1       3    male      1  ...  A/5 21171
1         0            2       1  female      1  ...  PC 17599
2         0            3       3  female      0  ...  STON/O2. 3101282
3         0            4       1  female      1  ...  113803
4         0            5       3    male      0  ...  373450
5         0            6       3    male      0  ...  330877
6         0            7       1    male      0  ...  17463
7         1            8       3    male      3  ...  349909
8         2            9       3  female      0  ...  347742
9         0           10       2  female      1  ...  237736
10        1           11       3  female      1  ...  PP 9549
11        0           12       1  female      0  ...  113783
12        0           13       3    male      0  ...  A/5. 2151
13        5           14       3    male      1  ...  347082
14        0           15       3  female      0  ...  350406
15        0           16       2  female      0  ...  248706
16        1           17       3    male      4  ...  382652
17        0           18       2    male      0  ...  244373
18        0           19       3  female      1  ...  345763
19        0           20       3  female      0  ...  2649
20        0           21       2    male      0  ...  239865
21        0           22       2    male      0  ...  248698
22        0           23       3  female      0  ...  330923
23        0           24       1    male      0  ...  113788
24        1           25       3  female      3  ...  349909
25        5           26       3  female      1  ...  347077
26        0           27       3    male      0  ...  2631
27        2           28       1    male      3  ...  19950
28        0           29       3  female      0  ...  330959
29        0           30       3    male      0  ...  349216
...     ...          ...     ...     ...    ...  ...  ...
1279      0         1280       3    male      0  ...  364858
1280      1         1281       3    male      3  ...  349909
1281      0         1282       1    male      0  ...  12749
1282      1         1283       1  female      0  ...  PC 17592
1283      2         1284       3    male      0  ...  C.A. 2673
1284      0         1285       2    male      0  ...  C.A. 30769
1285      1         1286       3    male      3  ...  315153
1286      0         1287       1  female      1  ...  13695
1287      0         1288       3    male      0  ...  371109
1288      1         1289       1  female      1  ...  13567
1289      0         1290       3    male      0  ...  347065
1290      0         1291       3    male      0  ...  21332
1291      0         1292       1  female      0  ...  36928
1292      0         1293       2    male      1  ...  28664
1293      1         1294       1  female      0  ...  112378
1294      0         1295       1    male      0  ...  113059
1295      0         1296       1    male      1  ...  17765
1296      0         1297       2    male      0  ...  SC/PARIS 2166
1297      0         1298       2    male      1  ...  28666
1298      1         1299       1    male      1  ...  113503
1299      0         1300       3  female      0  ...  334915
1300      1         1301       3  female      1  ...  SOTON/O.Q. 3101315
1301      0         1302       3  female      0  ...  365237
1302      0         1303       1  female      1  ...  19928
1303      0         1304       3  female      0  ...  347086
1304      0         1305       3    male      0  ...  A.5. 3236
1305      0         1306       1  female      0  ...  PC 17758
1306      0         1307       3    male      0  ...  SOTON/O.Q. 3101262
1307      0         1308       3    male      0  ...  359309
1308      1         1309       3    male      1  ...  2668

      Cabin_A  Cabin_B  Cabin_C  Cabin_D  Cabin_E  Cabin_F  Cabin_G  Cabin_T  Cabin_U
0           0        0        0        0        0        0        0        0        1
1           0        0        1        0        0        0        0        0        0
2           0        0        0        0        0        0        0        0        1
3           0        0        1        0        0        0        0        0        0
4           0        0        0        0        0        0        0        0        1
5           0        0        0        0        0        0        0        0        1
6           0        0        0        0        1        0        0        0        0
7           0        0        0        0        0        0        0        0        1
8           0        0        0        0        0        0        0        0        1
9           0        0        0        0        0        0        0        0        1
10          0        0        0        0        0        0        1        0        0
11          0        0        1        0        0        0        0        0        0
12          0        0        0        0        0        0        0        0        1
13          0        0        0        0        0        0        0        0        1
14          0        0        0        0        0        0        0        0        1
15          0        0        0        0        0        0        0        0        1
16          0        0        0        0        0        0        0        0        1
17          0        0        0        0        0        0        0        0        1
18          0        0        0        0        0        0        0        0        1
19          0        0        0        0        0        0        0        0        1
20          0        0        0        0        0        0        0        0        1
21          0        0        0        1        0        0        0        0        0
22          0        0        0        0        0        0        0        0        1
23          1        0        0        0        0        0        0        0        0
24          0        0        0        0        0        0        0        0        1
25          0        0        0        0        0        0        0        0        1
26          0        0        0        0        0        0        0        0        1
27          0        0        1        0        0        0        0        0        0
28          0        0        0        0        0        0        0        0        1
29          0        0        0        0        0        0        0        0        1
...       ...      ...      ...      ...      ...      ...      ...      ...      ...
1279        0        0        0        0        0        0        0        0        1
1280        0        0        0        0        0        0        0        0        1
1281        0        1        0        0        0        0        0        0        0
1282        0        0        0        1        0        0        0        0        0
1283        0        0        0        0        0        0        0        0        1
1284        0        0        0        0        0        0        0        0        1
1285        0        0        0        0        0        0        0        0        1
1286        0        0        1        0        0        0        0        0        0
1287        0        0        0        0        0        0        0        0        1
1288        0        1        0        0        0        0        0        0        0
1289        0        0        0        0        0        0        0        0        1
1290        0        0        0        0        0        0        0        0        1
1291        0        0        1        0        0        0        0        0        0
1292        0        0        0        0        0        0        0        0        1
1293        0        0        0        0        0        0        0        0        1
1294        0        0        0        0        0        0        0        0        1
1295        0        0        0        1        0        0        0        0        0
1296        0        0        0        1        0        0        0        0        0
1297        0        0        0        0        0        0        0        0        1
1298        0        0        1        0        0        0        0        0        0
1299        0        0        0        0        0        0        0        0        1
1300        0        0        0        0        0        0        0        0        1
1301        0        0        0        0        0        0        0        0        1
1302        0        0        1        0        0        0        0        0        0
1303        0        0        0        0        0        0        0        0        1
1304        0        0        0        0        0        0        0        0        1
1305        0        0        1        0        0        0        0        0        0
1306        0        0        0        0        0        0        0        0        1
1307        0        0        0        0        0        0        0        0        1
1308        0        0        0        0        0        0        0        0        1

1309 rows × 21 columns



In [33]:

    
#print( full.where((full['Sex'] == 0) & (full['Pclass'] == 1)).groupby(['Pclass','Sex'])['Age'].mean() )
print( full['Sex'].isnull().sum() )




In [34]:

    
#byTicket = full.where(full['Cabin'].isnull()).groupby(['Name'])['Ticket']
#byFare = full.where(full['Cabin'].isnull()).groupby(['Pclass'])['Fare']
#byTicket.head(5)
#byFare.head(5)



In [35]:

    
full = pfunc.convertSexToNum(full)
full.head()









    Out[35]:







  
    
      
    Age  Cabin  Embarked     Fare  Name  \
0  22.0      U         S   7.2500  Braund, Mr. Owen Harris
1  38.0      C         C  71.2833  Cumings, Mrs. John Bradley (Florence Briggs Th...
2  26.0      U         S   7.9250  Heikkinen, Miss. Laina
3  35.0      C         S  53.1000  Futrelle, Mrs. Jacques Heath (Lily May Peel)
4  35.0      U         S   8.0500  Allen, Mr. William Henry

   Parch  PassengerId  Pclass  SibSp  Survived  ...  \
0      0            1       3      1       0.0  ...
1      0            2       1      1       1.0  ...
2      0            3       3      0       1.0  ...
3      0            4       1      1       1.0  ...
4      0            5       3      0       0.0  ...

   Cabin_A  Cabin_B  Cabin_C  Cabin_D  Cabin_E  Cabin_F  Cabin_G  Cabin_T  Cabin_U  Sex
0        0        0        0        0        0        0        0        0        1    0
1        0        0        1        0        0        0        0        0        0    1
2        0        0        0        0        0        0        0        0        1    1
3        0        0        1        0        0        0        0        0        0    1
4        0        0        0        0        0        0        0        0        1    0

5 rows × 21 columns
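
The helper module `pltFunctions` is not shown in the notebook. Judging only from the output above, where `male` rows end up with `Sex` 0 and `female` rows with 1, `convertSexToNum` may do something like the following. This is a guess at its behaviour, not its actual implementation, and `convert_sex_to_num` is a hypothetical name.

```python
import pandas as pd

def convert_sex_to_num(df):
    """Hypothetical stand-in for pfunc.convertSexToNum: male -> 0, female -> 1."""
    out = df.copy()
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out

demo = pd.DataFrame({"Name": ["Braund", "Cumings", "Heikkinen"],
                     "Sex": ["male", "female", "female"]})
print(convert_sex_to_num(demo))
```

In the output above `Sex` moves to the last column, which suggests the real helper drops and re-adds the column. The sketch keeps the original position.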



In [36]:

    
# Name the deck according to the Cabin value
# Use 'U' as the deck when the cabin is unknown
full = pfunc.fillDeck(full)

pd.crosstab(full['Deck'], full['Survived'])
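
The crosstab output is not preserved in the export. Because most passengers fall into the unknown deck `U`, raw counts are hard to compare across decks. The sketch below shows survival rates per deck on made-up values, restricted to rows that have a label. `demo`, `labelled` and `rates` are illustrative names.

```python
import pandas as pd

# Made-up values; the last row mimics a test-set passenger without a label.
demo = pd.DataFrame({
    "Deck": ["U", "U", "C", "C", "E", "U"],
    "Survived": [0.0, 1.0, 1.0, 1.0, 0.0, None],
})

# Only the training rows have a known outcome.
labelled = demo[demo["Survived"].notna()]

# normalize='index' turns each row into shares that sum to 1 within the deck.
rates = pd.crosstab(labelled["Deck"], labelled["Survived"], normalize="index")
print(rates)
```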



In [37]:

    
print(full.isnull().sum())
print("========================================")
print(full.info())









    



Age              0
Cabin            0
Embarked         0
Fare             0
Name             0
Parch            0
PassengerId      0
Pclass           0
SibSp            0
Survived       418
Ticket           0
Cabin_A          0
Cabin_B          0
Cabin_C          0
Cabin_D          0
Cabin_E          0
Cabin_F          0
Cabin_G          0
Cabin_T          0
Cabin_U          0
Sex              0
Deck             0
dtype: int64
========================================
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 22 columns):
Age            1309 non-null float64
Cabin          1309 non-null object
Embarked       1309 non-null object
Fare           1309 non-null float64
Name           1309 non-null object
Parch          1309 non-null int64
PassengerId    1309 non-null int64
Pclass         1309 non-null int64
SibSp          1309 non-null int64
Survived       891 non-null float64
Ticket         1309 non-null object
Cabin_A        1309 non-null uint8
Cabin_B        1309 non-null uint8
Cabin_C        1309 non-null uint8
Cabin_D        1309 non-null uint8
Cabin_E        1309 non-null uint8
Cabin_F        1309 non-null uint8
Cabin_G        1309 non-null uint8
Cabin_T        1309 non-null uint8
Cabin_U        1309 non-null uint8
Sex            1309 non-null int64
Deck           1309 non-null object
dtypes: float64(3), int64(5), object(5), uint8(9)
memory usage: 144.5+ KB
None



In [38]:

    
# Run the feature engineering once, then print the result
full = pfunc.featureEng( full )
print( full )









    



            Age Cabin Embarked      Fare  \
0     22.000000     U        S    7.2500   
1     38.000000     C        C   71.2833   
2     26.000000     U        S    7.9250   
3     35.000000     C        S   53.1000   
4     35.000000     U        S    8.0500   
5     29.881138     U        Q    8.4583   
6     54.000000     E        S   51.8625   
7      2.000000     U        S   21.0750   
8     27.000000     U        S   11.1333   
9     14.000000     U        C   30.0708   
10     4.000000     G        S   16.7000   
11    58.000000     C        S   26.5500   
12    20.000000     U        S    8.0500   
13    39.000000     U        S   31.2750   
14    14.000000     U        S    7.8542   
15    55.000000     U        S   16.0000   
16     2.000000     U        Q   29.1250   
17    29.881138     U        S   13.0000   
18    31.000000     U        S   18.0000   
19    29.881138     U        C    7.2250   
20    35.000000     U        S   26.0000   
21    34.000000     D        S   13.0000   
22    15.000000     U        Q    8.0292   
23    28.000000     A        S   35.5000   
24     8.000000     U        S   21.0750   
25    38.000000     U        S   31.3875   
26    29.881138     U        C    7.2250   
27    19.000000     C        S  263.0000   
28    29.881138     U        Q    7.8792   
29    29.881138     U        S    7.8958   
...         ...   ...      ...       ...   
1279  21.000000     U        Q    7.7500   
1280   6.000000     U        S   21.0750   
1281  23.000000     B        S   93.5000   
1282  51.000000     D        S   39.4000   
1283  13.000000     U        S   20.2500   
1284  47.000000     U        S   10.5000   
1285  29.000000     U        S   22.0250   
1286  18.000000     C        S   60.0000   
1287  24.000000     U        Q    7.2500   
1288  48.000000     B        C   79.2000   
1289  22.000000     U        S    7.7750   
1290  31.000000     U        Q    7.7333   
1291  30.000000     C        S  164.8667   
1292  38.000000     U        S   21.0000   
1293  22.000000     U        C   59.4000   
1294  17.000000     U        S   47.1000   
1295  43.000000     D        C   27.7208   
1296  20.000000     D        C   13.8625   
1297  23.000000     U        S   10.5000   
1298  50.000000     C        C  211.5000   
1299  29.881138     U        Q    7.7208   
1300   3.000000     U        S   13.7750   
1301  29.881138     U        Q    7.7500   
1302  37.000000     C        Q   90.0000   
1303  28.000000     U        S    7.7750   
1304  29.881138     U        S    8.0500   
1305  39.000000     C        C  108.9000   
1306  38.500000     U        S    7.2500   
1307  29.881138     U        S    8.0500   
1308  29.881138     U        C   22.3583   

                                                   Name  Parch  PassengerId  \
0                               Braund, Mr. Owen Harris      0            1   
1     Cumings, Mrs. John Bradley (Florence Briggs Th...      0            2   
2                                Heikkinen, Miss. Laina      0            3   
3          Futrelle, Mrs. Jacques Heath (Lily May Peel)      0            4   
4                              Allen, Mr. William Henry      0            5   
5                                      Moran, Mr. James      0            6   
6                               McCarthy, Mr. Timothy J      0            7   
7                        Palsson, Master. Gosta Leonard      1            8   
8     Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)      2            9   
9                   Nasser, Mrs. Nicholas (Adele Achem)      0           10   
10                      Sandstrom, Miss. Marguerite Rut      1           11   
11                             Bonnell, Miss. Elizabeth      0           12   
12                       Saundercock, Mr. William Henry      0           13   
13                          Andersson, Mr. Anders Johan      5           14   
14                 Vestrom, Miss. Hulda Amanda Adolfina      0           15   
15                     Hewlett, Mrs. (Mary D Kingcome)       0           16   
16                                 Rice, Master. Eugene      1           17   
17                         Williams, Mr. Charles Eugene      0           18   
18    Vander Planke, Mrs. Julius (Emelia Maria Vande...      0           19   
19                              Masselmani, Mrs. Fatima      0           20   
20                                 Fynney, Mr. Joseph J      0           21   
21                                Beesley, Mr. Lawrence      0           22   
22                          McGowan, Miss. Anna "Annie"      0           23   
23                         Sloper, Mr. William Thompson      0           24   
24                        Palsson, Miss. Torborg Danira      1           25   
25    Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...      5           26   
26                              Emir, Mr. Farred Chehab      0           27   
27                       Fortune, Mr. Charles Alexander      2           28   
28                        O'Dwyer, Miss. Ellen "Nellie"      0           29   
29                                  Todoroff, Mr. Lalio      0           30   
...                                                 ...    ...          ...   
1279                               Canavan, Mr. Patrick      0         1280   
1280                        Palsson, Master. Paul Folke      1         1281   
1281                         Payne, Mr. Vivian Ponsonby      0         1282   
1282     Lines, Mrs. Ernest H (Elizabeth Lindsey James)      1         1283   
1283                      Abbott, Master. Eugene Joseph      2         1284   
1284                               Gilbert, Mr. William      0         1285   
1285                           Kink-Heilmann, Mr. Anton      1         1286   
1286     Smith, Mrs. Lucien Philip (Mary Eloise Hughes)      0         1287   
1287                               Colbert, Mr. Patrick      0         1288   
1288  Frolicher-Stehli, Mrs. Maxmillian (Margaretha ...      1         1289   
1289                     Larsson-Rondberg, Mr. Edvard A      0         1290   
1290                           Conlon, Mr. Thomas Henry      0         1291   
1291                            Bonnell, Miss. Caroline      0         1292   
1292                                    Gale, Mr. Harry      0         1293   
1293                     Gibson, Miss. Dorothy Winifred      1         1294   
1294                             Carrau, Mr. Jose Pedro      0         1295   
1295                       Frauenthal, Mr. Isaac Gerald      0         1296   
1296       Nourney, Mr. Alfred (Baron von Drachstedt")"      0         1297   
1297                          Ware, Mr. William Jeffery      0         1298   
1298                         Widener, Mr. George Dunton      1         1299   
1299                    Riordan, Miss. Johanna Hannah""      0         1300   
1300                          Peacock, Miss. Treasteall      1         1301   
1301                             Naughton, Miss. Hannah      0         1302   
1302    Minahan, Mrs. William Edward (Lillian E Thorpe)      0         1303   
1303                     Henriksson, Miss. Jenny Lovisa      0         1304   
1304                                 Spector, Mr. Woolf      0         1305   
1305                       Oliva y Ocana, Dona. Fermina      0         1306   
1306                       Saether, Mr. Simon Sivertsen      0         1307   
1307                                Ware, Mr. Frederick      0         1308   
1308                           Peter, Master. Michael J      1         1309   

      Pclass  SibSp  Survived      ...      FamilyLarge  TicketType   Title  \
0          3      1       0.0      ...                0           A      Mr   
1          1      1       1.0      ...                0           P     Mrs   
2          3      0       1.0      ...                0           S    Miss   
3          1      1       1.0      ...                0           1     Mrs   
4          3      0       0.0      ...                0           3      Mr   
5          3      0       0.0      ...                0           3      Mr   
6          1      0       0.0      ...                0           1      Mr   
7          3      3       0.0      ...                1           3  Master   
8          3      0       1.0      ...                0           3     Mrs   
9          2      1       1.0      ...                0           2     Mrs   
10         3      1       1.0      ...                0           P    Miss   
11         1      0       1.0      ...                0           1    Miss   
12         3      0       0.0      ...                0           A      Mr   
13         3      1       0.0      ...                1           3      Mr   
14         3      0       0.0      ...                0           3    Miss   
15         2      0       1.0      ...                0           2     Mrs   
16         3      4       0.0      ...                1           3  Master   
17         2      0       1.0      ...                0           2      Mr   
18         3      1       0.0      ...                0           3     Mrs   
19         3      0       1.0      ...                0           2     Mrs   
20         2      0       0.0      ...                0           2      Mr   
21         2      0       1.0      ...                0           2      Mr   
22         3      0       1.0      ...                0           3    Miss   
23         1      0       1.0      ...                0           1      Mr   
24         3      3       0.0      ...                1           3    Miss   
25         3      1       1.0      ...                1           3     Mrs   
26         3      0       0.0      ...                0           2      Mr   
27         1      3       0.0      ...                1           1      Mr   
28         3      0       1.0      ...                0           3    Miss   
29         3      0       0.0      ...                0           3      Mr   
...      ...    ...       ...      ...              ...         ...     ...   
1279       3      0       NaN      ...                0           3      Mr   
1280       3      3       NaN      ...                1           3  Master   
1281       1      0       NaN      ...                0           1      Mr   
1282       1      0       NaN      ...                0           P     Mrs   
1283       3      0       NaN      ...                0           C  Master   
1284       2      0       NaN      ...                0           C      Mr   
1285       3      3       NaN      ...                1           3      Mr   
1286       1      1       NaN      ...                0           1     Mrs   
1287       3      0       NaN      ...                0           3      Mr   
1288       1      1       NaN      ...                0           1     Mrs   
1289       3      0       NaN      ...                0           3      Mr   
1290       3      0       NaN      ...                0           2      Mr   
1291       1      0       NaN      ...                0           3    Miss   
1292       2      1       NaN      ...                0           2      Mr   
1293       1      0       NaN      ...                0           1    Miss   
1294       1      0       NaN      ...                0           1      Mr   
1295       1      1       NaN      ...                0           1      Mr   
1296       2      0       NaN      ...                0           S      Mr   
1297       2      1       NaN      ...                0           2      Mr   
1298       1      1       NaN      ...                0           1      Mr   
1299       3      0       NaN      ...                0           3    Miss   
1300       3      1       NaN      ...                0           S    Miss   
1301       3      0       NaN      ...                0           3    Miss   
1302       1      1       NaN      ...                0           1     Mrs   
1303       3      0       NaN      ...                0           3    Miss   
1304       3      0       NaN      ...                0           A      Mr   
1305       1      0       NaN      ...                0           P    Dona   
1306       3      0       NaN      ...                0           S      Mr   
1307       3      0       NaN      ...                0           3      Mr   
1308       3      1       NaN      ...                0           2  Master   

      Fare_cat  Bad_ticket  Young  Shared_ticket  Ticket_group   Fare_eff  \
0            0        True   True              0             1   7.250000   
1            1       False  False              1             2  35.641650   
2            0       False   True              0             1   7.925000   
3            1       False  False              1             2  26.550000   
4            0        True  False              0             1   8.050000   
5            0        True   True              0             1   8.458300   
6            1       False  False              1             2  25.931250   
7            1        True   True              1             5   4.215000   
8            1        True   True              1             3   3.711100   
9            1       False   True              1             2  15.035400   
10           1       False   True              1             3   5.566667   
11           1       False   True              0             1  26.550000   
12           0        True   True              0             1   8.050000   
13           1        True  False              1             7   4.467857   
14           0        True   True              0             1   7.854200   
15           1       False  False              0             1  16.000000   
16           1        True   True              1             6   4.854167   
17           1       False   True              0             1  13.000000   
18           1        True  False              1             2   9.000000   
19           0       False   True              0             1   7.225000   
20           1       False  False              1             2  13.000000   
21           1       False  False              0             1  13.000000   
22           0        True   True              0             1   8.029200   
23           1       False   True              0             1  35.500000   
24           1        True   True              1             5   4.215000   
25           1        True  False              1             7   4.483929   
26           0       False   True              0             1   7.225000   
27           2       False   True              1             6  43.833333   
28           0        True   True              0             1   7.879200   
29           0        True   True              0             1   7.895800   
...        ...         ...    ...            ...           ...        ...   
1279         0        True   True              0             1   7.750000   
1280         1        True   True              1             5   4.215000   
1281         1       False   True              1             4  23.375000   
1282         1       False  False              1             2  19.700000   
1283         1       False   True              1             3   6.750000   
1284         1       False  False              0             1  10.500000   
1285         1        True   True              1             3   7.341667   
1286         1       False   True              1             2  30.000000   
1287         0        True   True              0             1   7.250000   
1288         1       False  False              1             2  39.600000   
1289         0        True   True              0             1   7.775000   
1290         0       False  False              0             1   7.733300   
1291         2        True   True              1             4  41.216675   
1292         1       False  False              1             2  10.500000   
1293         1       False   True              1             2  29.700000   
1294         1       False   True              1             2  23.550000   
1295         1       False  False              0             1  27.720800   
1296         1       False   True              0             1  13.862500   
1297         1       False   True              0             1  10.500000   
1298         2       False  False              1             5  42.300000   
1299         0        True   True              0             1   7.720800   
1300         1       False   True              1             3   4.591667   
1301         0        True   True              0             1   7.750000   
1302         1       False  False              1             3  30.000000   
1303         0        True   True              0             1   7.775000   
1304         0        True   True              0             1   8.050000   
1305         2       False  False              1             3  36.300000   
1306         0       False  False              0             1   7.250000   
1307         0        True   True              0             1   8.050000   
1308         1       False   True              1             3   7.452767   

      Fare_eff_cat  
0                0  
1                2  
2                0  
3                2  
4                0  
5                0  
6                2  
7                0  
8                0  
9                1  
10               0  
11               2  
12               0  
13               0  
14               0  
15               1  
16               0  
17               1  
18               1  
19               0  
20               1  
21               1  
22               0  
23               2  
24               0  
25               0  
26               0  
27               2  
28               0  
29               0  
...            ...  
1279             0  
1280             0  
1281             2  
1282             2  
1283             0  
1284             1  
1285             0  
1286             2  
1287             0  
1288             2  
1289             0  
1290             0  
1291             2  
1292             1  
1293             2  
1294             2  
1295             2  
1296             1  
1297             1  
1298             2  
1299             0  
1300             0  
1301             0  
1302             2  
1303             0  
1304             0  
1305             2  
1306             0  
1307             0  
1308             0  

[1309 rows x 38 columns]
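
The printed values are consistent with Ticket_group being the number of passengers who share a ticket and Fare_eff being Fare divided by Ticket_group (for example 71.2833 / 2 = 35.64165). The code of pfunc.featureEng is not shown, so the sketch below is an assumption inferred from the output, not the helper's actual implementation. It uses five Ticket and Fare values that appear in this notebook.

```python
import pandas as pd

# Five Ticket / Fare pairs taken from rows printed in this notebook.
sample = pd.DataFrame({
    "Ticket": ["A/5 21171", "113803", "349909", "349909", "349909"],
    "Fare": [7.25, 53.10, 21.075, 21.075, 21.075],
})

# Assumption: group size = number of rows that carry the same ticket.
sample["Ticket_group"] = sample.groupby("Ticket")["Ticket"].transform("count")

# Assumption: a ticket counts as shared when more than one passenger holds it.
sample["Shared_ticket"] = (sample["Ticket_group"] > 1).astype(int)

# Assumption: effective fare = ticket fare split evenly across the group.
sample["Fare_eff"] = sample["Fare"] / sample["Ticket_group"]

print(sample)
```

The group sizes here are smaller than in the full output because the sample holds only a few of the passengers on each ticket.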



In [39]:

    
#pfunc.pltCorrel( combined )
#pfunc.pltCorrel( full )

Correlations to Investigate

Pclass is correlated with Fare ( 1st class tickets are expected to be more expensive than tickets in other classes )

Pclass x Age

SibSp x Age

SibSp x Fare

SibSp is correlated with Parch ( large families would have high Parch values and solo travellers would have zero )

Pclass is noticeably correlated with Survived ( as expected, passengers in higher classes were more likely to survive )
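
These pairs can be checked numerically with pandas. The sketch below is a minimal illustration, assuming Pearson correlation is an acceptable first look; it uses only the first eight rows printed in this notebook, which is far too few to draw conclusions, so treat the numbers as a demonstration of the call rather than a result.

```python
import pandas as pd

# First eight rows of the relevant columns, as printed in this notebook.
sample = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1, 3, 3, 1, 3],
    "Fare":     [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583, 51.8625, 21.075],
    "SibSp":    [1, 1, 0, 1, 0, 0, 0, 3],
    "Parch":    [0, 0, 0, 0, 0, 0, 0, 1],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0],
})

# DataFrame.corr() computes pairwise Pearson correlation by default.
corr = sample.corr()

pairs = [("Pclass", "Fare"), ("SibSp", "Parch"), ("Pclass", "Survived")]
for a, b in pairs:
    print(f"{a} x {b}: {corr.loc[a, b]:+.2f}")
```

Because Pclass is coded 1 for the highest class, a negative coefficient against Fare or Survived means higher classes paid more or survived more often.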



In [40]:

    
# Plot the Age distribution of passengers who survived and who did not, split by Sex
#pfunc.pltDistro( train , var = 'Age' , target = 'Survived' , row = 'Sex' )



In [41]:

    
# Plot the distribution of Survived for each Pclass, split by Sex
#pfunc.pltDistro( train , var = 'Survived' , target = 'Pclass' , row = 'Sex' )



In [42]:

    
# Plot the Parch distribution of passengers who survived and who did not, split by Sex
#pfunc.pltDistro( train , var = 'Parch' , target = 'Survived' , row = 'Sex' )



In [43]:

    
full.head(5)









    Out[43]:

    Age  Cabin  Embarked     Fare                                               Name  Parch  \
0  22.0      U         S   7.2500                            Braund, Mr. Owen Harris      0
1  38.0      C         C  71.2833  Cumings, Mrs. John Bradley (Florence Briggs Th...      0
2  26.0      U         S   7.9250                             Heikkinen, Miss. Laina      0
3  35.0      C         S  53.1000       Futrelle, Mrs. Jacques Heath (Lily May Peel)      0
4  35.0      U         S   8.0500                           Allen, Mr. William Henry      0

   PassengerId  Pclass  SibSp  Survived  ...  FamilyLarge  TicketType  Title  Fare_cat  \
0            1       3      1       0.0  ...            0           A     Mr         0
1            2       1      1       1.0  ...            0           P    Mrs         1
2            3       3      0       1.0  ...            0           S   Miss         0
3            4       1      1       1.0  ...            0           1    Mrs         1
4            5       3      0       0.0  ...            0           3     Mr         0

   Bad_ticket  Young  Shared_ticket  Ticket_group  Fare_eff  Fare_eff_cat
0        True   True              0             1   7.25000             0
1       False  False              1             2  35.64165             2
2       False   True              0             1   7.92500             0
3       False  False              1             2  26.55000             2
4        True  False              0             1   8.05000             0

5 rows × 38 columns



In [49]:

    
# Plot survival by category (Embarked, Pclass, Sex, Parch, SibSp) and the Age distribution by Sex

#pfunc.pltCategories( train , cat = 'Embarked' , target = 'Survived' ) 
#pfunc.pltCategories( train , cat = 'Pclass' , target = 'Survived' )
#pfunc.pltCategories( train , cat = 'Sex' , target = 'Survived' )
#pfunc.pltCategories( train , cat = 'Parch' , target = 'Survived' )
#pfunc.pltCategories( train , cat = 'SibSp' , target = 'Survived' )
#pfunc.pltDistro( train , var = 'Age' , target = 'Survived' , row = 'Sex' )
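# Drop the target column; pass axis by keyword, since the positional form is rejected by newer pandas
full = full.drop('Survived', axis=1)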



In [ ]:

    
def getTitles(dframe):
    # Extract the title between the comma and the first period,
    # e.g. "Braund, Mr. Owen Harris" -> "Mr"
    dframe['Title'] = dframe['Name'].map(lambda name: name.split(',')[1].split('.')[0].strip())
    # Group the raw titles into a small set of categories
    myDict = {
        "Capt":         "Officer",
        "Col":          "Officer",
        "Major":        "Officer",
        "Dr":           "Officer",
        "Rev":          "Officer",
        "Lady":         "Royalty",
        "Jonkheer":     "Royalty",
        "Don":          "Royalty",
        "Sir":          "Royalty",
        "the Countess": "Royalty",
        "Dona":         "Royalty",
        "Mme":          "Mrs",
        "Mlle":         "Miss",
        "Ms":           "Mrs",
        "Mr":           "Mr",
        "Mrs":          "Mrs",
        "Miss":         "Miss",
        "Master":       "Master"
    }

    # Titles that are missing from myDict become NaN after this mapping
    dframe['Title'] = dframe.Title.map(myDict)
    return dframe
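
A small usage sketch of the same extract-then-map idea follows. Three names come from this notebook's output; the fourth name ("Doe, Sgt. John") and the fallback label "Other" are hypothetical and only illustrate what happens to a title that is not in the dictionary, since Series.map returns NaN for unmapped keys.

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Oliva y Ocana, Dona. Fermina",
    "Heikkinen, Miss. Laina",
    "Doe, Sgt. John",  # hypothetical name with a title outside the mapping
])

# Same extraction as getTitles: text between the comma and the first period.
raw_titles = names.map(lambda name: name.split(",")[1].split(".")[0].strip())

# A subset of the mapping used in getTitles.
title_map = {"Mr": "Mr", "Miss": "Miss", "Dona": "Royalty"}

# Unmapped titles become NaN; "Other" is an assumed fallback, not part of the notebook.
titles = raw_titles.map(title_map).fillna("Other")
print(titles.tolist())
```

Checking for NaN after the mapping is a cheap way to catch titles the dictionary does not cover.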



In [57]:

    
full = getTitles(full)
full.head()









    Out[57]:

    Age  Cabin  Embarked     Fare                                               Name  Parch  \
0  22.0      U         S   7.2500                            Braund, Mr. Owen Harris      0
1  38.0      C         C  71.2833  Cumings, Mrs. John Bradley (Florence Briggs Th...      0
2  26.0      U         S   7.9250                             Heikkinen, Miss. Laina      0
3  35.0      C         S  53.1000       Futrelle, Mrs. Jacques Heath (Lily May Peel)      0
4  35.0      U         S   8.0500                           Allen, Mr. William Henry      0

   PassengerId  Pclass  SibSp            Ticket  ...  FamilyLarge  TicketType  Title  Fare_cat  \
0            1       3      1         A/5 21171  ...            0           A     Mr         0
1            2       1      1          PC 17599  ...            0           P    Mrs         1
2            3       3      0  STON/O2. 3101282  ...            0           S   Miss         0
3            4       1      1            113803  ...            0           1    Mrs         1
4            5       3      0            373450  ...            0           3     Mr         0

   Bad_ticket  Young  Shared_ticket  Ticket_group  Fare_eff  Fare_eff_cat
0        True   True              0             1   7.25000             0
1       False  False              1             2  35.64165             2
2       False   True              0             1   7.92500             0
3       False  False              1             2  26.55000             2
4        True  False              0             1   8.05000             0

5 rows × 37 columns



In [56]:

    
# plot functions
import pltFunctions as pfunc
train_X, test_X, target_y = pfunc.prepareTrainTestTarget(full)
#train_valid_X = full[ 0:891 ]
#train_valid_y = full.Survived
#test_X = full[ 891: ]
#train_X , valid_X , train_y , valid_y = train_test_split( train_X , train_valid_y , train_size = .7 )

print (full.shape , train_X.shape , target_y.shape , test_X.shape)









    



(1309, 37) (891, 37) (891,) (418, 37)



In [51]:

    
model = RandomForestClassifier(n_estimators=100)
#model = SVC()
model.fit( train_X , target_y )









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-51-ebc6e6e342d3> in <module>()
      1 model = RandomForestClassifier(n_estimators=100)
      2 #model = SVC()
----> 3 model.fit( train_X , target_y )

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
    245         """
    246         # Validate or convert input data
--> 247         X = check_array(X, accept_sparse="csc", dtype=DTYPE)
    248         y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
    249         if issparse(X):

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    380                                       force_all_finite)
    381     else:
--> 382         array = np.array(array, dtype=dtype, order=order, copy=copy)
    383 
    384         if ensure_2d:

ValueError: could not convert string to float: 'Mr'
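
The traceback shows that train_X still contains string columns (here the Title value 'Mr'), which RandomForestClassifier cannot convert to floats. One possible fix, sketched below under the assumption that one-hot encoding suits these columns, is to encode every non-numeric, non-boolean column with pd.get_dummies before fitting. The toy frame uses six rows printed in this notebook; toy_X, toy_y, and encoded_X are illustrative names, and random_state is added only to make the sketch repeatable.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Six rows taken from this notebook's output (rows 0-4 and 6).
toy_X = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 54.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 51.8625],
    "Title": ["Mr", "Mrs", "Miss", "Mrs", "Mr", "Mr"],
    "Embarked": ["S", "C", "S", "S", "S", "S"],
})
toy_y = pd.Series([0, 1, 1, 1, 0, 0])

# Find the columns that are neither numeric nor boolean (the text columns).
text_cols = list(toy_X.select_dtypes(exclude=["number", "bool"]).columns)

# One-hot encode them; dtype=float keeps every resulting column numeric.
encoded_X = pd.get_dummies(toy_X, columns=text_cols, dtype=float)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(encoded_X, toy_y)
print(encoded_X.columns.tolist())
```

In the notebook itself the encoding would need to be applied to full before it is split, so that the training and test frames end up with the same columns.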



In [ ]:

	Age	Cabin	Embarked	Fare	Name	Parch	PassengerId	Pclass	Sex	SibSp	Survived	Ticket
0	22.0	NaN	S	7.2500	Braund, Mr. Owen Harris	0	1	3	male	1	0.0	A/5 21171
1	38.0	C85	C	71.2833	Cumings, Mrs. John Bradley (Florence Briggs Th...	0	2	1	female	1	1.0	PC 17599
2	26.0	NaN	S	7.9250	Heikkinen, Miss. Laina	0	3	3	female	0	1.0	STON/O2. 3101282
3	35.0	C123	S	53.1000	Futrelle, Mrs. Jacques Heath (Lily May Peel)	0	4	1	female	1	1.0	113803
4	35.0	NaN	S	8.0500	Allen, Mr. William Henry	0	5	3	male	0	0.0	373450
5	NaN	NaN	Q	8.4583	Moran, Mr. James	0	6	3	male	0	0.0	330877
6	54.0	E46	S	51.8625	McCarthy, Mr. Timothy J	0	7	1	male	0	0.0	17463
7	2.0	NaN	S	21.0750	Palsson, Master. Gosta Leonard	1	8	3	male	3	0.0	349909
8	27.0	NaN	S	11.1333	Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)	2	9	3	female	0	1.0	347742
9	14.0	NaN	C	30.0708	Nasser, Mrs. Nicholas (Adele Achem)	0	10	2	female	1	1.0	237736

	Age	Cabin	Embarked	Fare	Name	Parch	PassengerId	Pclass	Sex	SibSp	...	Ticket	Cabin_A	Cabin_B	Cabin_C	Cabin_D	Cabin_E	Cabin_F	Cabin_G	Cabin_T	Cabin_U
0	22.000000	U	S	7.2500	Braund, Mr. Owen Harris	0	1	3	male	1	...	A/5 21171	0	0	0	0	0	0	0	0	1
1	38.000000	C	C	71.2833	Cumings, Mrs. John Bradley (Florence Briggs Th...	0	2	1	female	1	...	PC 17599	0	0	1	0	0	0	0	0	0
2	26.000000	U	S	7.9250	Heikkinen, Miss. Laina	0	3	3	female	0	...	STON/O2. 3101282	0	0	0	0	0	0	0	0	1
3	35.000000	C	S	53.1000	Futrelle, Mrs. Jacques Heath (Lily May Peel)	0	4	1	female	1	...	113803	0	0	1	0	0	0	0	0	0
4	35.000000	U	S	8.0500	Allen, Mr. William Henry	0	5	3	male	0	...	373450	0	0	0	0	0	0	0	0	1
5	29.881138	U	Q	8.4583	Moran, Mr. James	0	6	3	male	0	...	330877	0	0	0	0	0	0	0	0	1
6	54.000000	E	S	51.8625	McCarthy, Mr. Timothy J	0	7	1	male	0	...	17463	0	0	0	0	1	0	0	0	0
7	2.000000	U	S	21.0750	Palsson, Master. Gosta Leonard	1	8	3	male	3	...	349909	0	0	0	0	0	0	0	0	1
8	27.000000	U	S	11.1333	Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)	2	9	3	female	0	...	347742	0	0	0	0	0	0	0	0	1
9	14.000000	U	C	30.0708	Nasser, Mrs. Nicholas (Adele Achem)	0	10	2	female	1	...	237736	0	0	0	0	0	0	0	0	1
10	4.000000	G	S	16.7000	Sandstrom, Miss. Marguerite Rut	1	11	3	female	1	...	PP 9549	0	0	0	0	0	0	1	0	0
11	58.000000	C	S	26.5500	Bonnell, Miss. Elizabeth	0	12	1	female	0	...	113783	0	0	1	0	0	0	0	0	0
12	20.000000	U	S	8.0500	Saundercock, Mr. William Henry	0	13	3	male	0	...	A/5. 2151	0	0	0	0	0	0	0	0	1
13	39.000000	U	S	31.2750	Andersson, Mr. Anders Johan	5	14	3	male	1	...	347082	0	0	0	0	0	0	0	0	1
14	14.000000	U	S	7.8542	Vestrom, Miss. Hulda Amanda Adolfina	0	15	3	female	0	...	350406	0	0	0	0	0	0	0	0	1
15	55.000000	U	S	16.0000	Hewlett, Mrs. (Mary D Kingcome)	0	16	2	female	0	...	248706	0	0	0	0	0	0	0	0	1
16	2.000000	U	Q	29.1250	Rice, Master. Eugene	1	17	3	male	4	...	382652	0	0	0	0	0	0	0	0	1
17	29.881138	U	S	13.0000	Williams, Mr. Charles Eugene	0	18	2	male	0	...	244373	0	0	0	0	0	0	0	0	1
18	31.000000	U	S	18.0000	Vander Planke, Mrs. Julius (Emelia Maria Vande...	0	19	3	female	1	...	345763	0	0	0	0	0	0	0	0	1
19	29.881138	U	C	7.2250	Masselmani, Mrs. Fatima	0	20	3	female	0	...	2649	0	0	0	0	0	0	0	0	1
20	35.000000	U	S	26.0000	Fynney, Mr. Joseph J	0	21	2	male	0	...	239865	0	0	0	0	0	0	0	0	1
21	34.000000	D	S	13.0000	Beesley, Mr. Lawrence	0	22	2	male	0	...	248698	0	0	0	1	0	0	0	0	0
22	15.000000	U	Q	8.0292	McGowan, Miss. Anna "Annie"	0	23	3	female	0	...	330923	0	0	0	0	0	0	0	0	1
23	28.000000	A	S	35.5000	Sloper, Mr. William Thompson	0	24	1	male	0	...	113788	1	0	0	0	0	0	0	0	0
24	8.000000	U	S	21.0750	Palsson, Miss. Torborg Danira	1	25	3	female	3	...	349909	0	0	0	0	0	0	0	0	1
25	38.000000	U	S	31.3875	Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...	5	26	3	female	1	...	347077	0	0	0	0	0	0	0	0	1
26	29.881138	U	C	7.2250	Emir, Mr. Farred Chehab	0	27	3	male	0	...	2631	0	0	0	0	0	0	0	0	1
27	19.000000	C	S	263.0000	Fortune, Mr. Charles Alexander	2	28	1	male	3	...	19950	0	0	1	0	0	0	0	0	0
28	29.881138	U	Q	7.8792	O'Dwyer, Miss. Ellen "Nellie"	0	29	3	female	0	...	330959	0	0	0	0	0	0	0	0	1
29	29.881138	U	S	7.8958	Todoroff, Mr. Lalio	0	30	3	male	0	...	349216	0	0	0	0	0	0	0	0	1
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1279	21.000000	U	Q	7.7500	Canavan, Mr. Patrick	0	1280	3	male	0	...	364858	0	0	0	0	0	0	0	0	1
1280	6.000000	U	S	21.0750	Palsson, Master. Paul Folke	1	1281	3	male	3	...	349909	0	0	0	0	0	0	0	0	1
1281	23.000000	B	S	93.5000	Payne, Mr. Vivian Ponsonby	0	1282	1	male	0	...	12749	0	1	0	0	0	0	0	0	0
1282	51.000000	D	S	39.4000	Lines, Mrs. Ernest H (Elizabeth Lindsey James)	1	1283	1	female	0	...	PC 17592	0	0	0	1	0	0	0	0	0
1283	13.000000	U	S	20.2500	Abbott, Master. Eugene Joseph	2	1284	3	male	0	...	C.A. 2673	0	0	0	0	0	0	0	0	1
1284	47.000000	U	S	10.5000	Gilbert, Mr. William	0	1285	2	male	0	...	C.A. 30769	0	0	0	0	0	0	0	0	1
1285	29.000000	U	S	22.0250	Kink-Heilmann, Mr. Anton	1	1286	3	male	3	...	315153	0	0	0	0	0	0	0	0	1
1286	18.000000	C	S	60.0000	Smith, Mrs. Lucien Philip (Mary Eloise Hughes)	0	1287	1	female	1	...	13695	0	0	1	0	0	0	0	0	0
1287	24.000000	U	Q	7.2500	Colbert, Mr. Patrick	0	1288	3	male	0	...	371109	0	0	0	0	0	0	0	0	1
1288	48.000000	B	C	79.2000	Frolicher-Stehli, Mrs. Maxmillian (Margaretha ...	1	1289	1	female	1	...	13567	0	1	0	0	0	0	0	0	0
1289	22.000000	U	S	7.7750	Larsson-Rondberg, Mr. Edvard A	0	1290	3	male	0	...	347065	0	0	0	0	0	0	0	0	1
1290	31.000000	U	Q	7.7333	Conlon, Mr. Thomas Henry	0	1291	3	male	0	...	21332	0	0	0	0	0	0	0	0	1
1291	30.000000	C	S	164.8667	Bonnell, Miss. Caroline	0	1292	1	female	0	...	36928	0	0	1	0	0	0	0	0	0
1292	38.000000	U	S	21.0000	Gale, Mr. Harry	0	1293	2	male	1	...	28664	0	0	0	0	0	0	0	0	1
1293	22.000000	U	C	59.4000	Gibson, Miss. Dorothy Winifred	1	1294	1	female	0	...	112378	0	0	0	0	0	0	0	0	1
1294	17.000000	U	S	47.1000	Carrau, Mr. Jose Pedro	0	1295	1	male	0	...	113059	0	0	0	0	0	0	0	0	1
1295	43.000000	D	C	27.7208	Frauenthal, Mr. Isaac Gerald	0	1296	1	male	1	...	17765	0	0	0	1	0	0	0	0	0
1296	20.000000	D	C	13.8625	Nourney, Mr. Alfred (Baron von Drachstedt")"	0	1297	2	male	0	...	SC/PARIS 2166	0	0	0	1	0	0	0	0	0
1297	23.000000	U	S	10.5000	Ware, Mr. William Jeffery	0	1298	2	male	1	...	28666	0	0	0	0	0	0	0	0	1
1298	50.000000	C	C	211.5000	Widener, Mr. George Dunton	1	1299	1	male	1	...	113503	0	0	1	0	0	0	0	0	0
1299	29.881138	U	Q	7.7208	Riordan, Miss. Johanna Hannah""	0	1300	3	female	0	...	334915	0	0	0	0	0	0	0	0	1
1300	3.000000	U	S	13.7750	Peacock, Miss. Treasteall	1	1301	3	female	1	...	SOTON/O.Q. 3101315	0	0	0	0	0	0	0	0	1
1301	29.881138	U	Q	7.7500	Naughton, Miss. Hannah	0	1302	3	female	0	...	365237	0	0	0	0	0	0	0	0	1
1302	37.000000	C	Q	90.0000	Minahan, Mrs. William Edward (Lillian E Thorpe)	0	1303	1	female	1	...	19928	0	0	1	0	0	0	0	0	0
1303	28.000000	U	S	7.7750	Henriksson, Miss. Jenny Lovisa	0	1304	3	female	0	...	347086	0	0	0	0	0	0	0	0	1
1304	29.881138	U	S	8.0500	Spector, Mr. Woolf	0	1305	3	male	0	...	A.5. 3236	0	0	0	0	0	0	0	0	1
1305	39.000000	C	C	108.9000	Oliva y Ocana, Dona. Fermina	0	1306	1	female	0	...	PC 17758	0	0	1	0	0	0	0	0	0
1306	38.500000	U	S	7.2500	Saether, Mr. Simon Sivertsen	0	1307	3	male	0	...	SOTON/O.Q. 3101262	0	0	0	0	0	0	0	0	1
1307	29.881138	U	S	8.0500	Ware, Mr. Frederick	0	1308	3	male	0	...	359309	0	0	0	0	0	0	0	0	1
1308	29.881138	U	C	22.3583	Peter, Master. Michael J	1	1309	3	male	1	...	2668	0	0	0	0	0	0	0	0	1